Abstract

A recently proposed method (O. Miramontes, P. Rohani, Physica D 166 (2002) 147) 
for estimating the scaling exponent in very short time series may give wrong results,
especially in the case of undersampled data.

Estimating the scaling exponent in the power spectrum of a signal is one of
the most important tasks in the analysis of real-world temporal sequences. 
The point of time series analysis is not making a statement about the signal 
in question, but gaining insight into the (unknown) mechanism responsible for
generating the signal. One may say that the time series is usually not very 
interesting, but its generating mechanism is. The scaling exponent is one of 
the quantities that are calculated in order to get a better understanding of this 
mechanism. Many techniques have been developed for this job; see Ref. [1] for a 
recent review. It is particularly difficult to estimate the scaling exponent if the 
time series is very short, which is frequent for biological, medical and ecological 
data. Indeed, it is not clear whether the notion of the scaling exponent has 
any well-defined meaning in this case. 

Short time series are usually characterized by a large sampling time, T. This is 
easy to understand: the series is short because it takes a long time to gather the
data, and it takes a long time to gather the data because the sampling time is
large and cannot be easily shortened. In a time series only the frequencies up to
the Nyquist frequency f_N = 1/(2T) are present [2]. Any possible power scaling
that is discovered within the series is necessarily restricted to the Nyquist
interval [0, f_N]. This is all the data can tell: there are no scientific grounds
to claim that the scaling extends beyond the Nyquist frequency. Because T is
large, f_N is small. Therefore, attributing the power scaling, if one is detected in
such a series, to noise is misleading at best, since for true noise the power scaling
extends over many (theoretically infinitely many) orders of magnitude.

When dealing with a short time series, one is also faced with another, even 
more acute problem: the Nyquist interval is sparsely covered. There is little 
one can tell about the power density at frequencies that are not multiples of
1/(2NT), where N is the length of the series, and it is not clear whether the
very notion of a power scaling can be used in such situations. Any attempts
to interpolate between frequencies directly accessible from the data merely
scale down the behaviour observed for these frequencies to those that are not 
directly accessible. This approach may work only if one knows a priori that 
the process used to generate the data is governed by a power law. 

A method published recently in Ref. [3] suffers from all these deficiencies. The 
method, named "multiple segmenting method" (MSM), consists of two steps: 
First, one calculates scaling exponents for each segment of length 2^s, s ≥ 3,
and calculates the average scaling exponents for each s; denote these averages
by g(n) with n = 2^s. In this step pseudo-replicates of the original data are
created. These pseudo-replicates have the same short-time, or large-frequency, 
correlations as the original series, and the process of replicating increases their 
statistical significance. However, any accidental correlations among the data 
are enhanced as well. Second, the relation 

^(n) = a + 4= (1) 



is fitted to the averages calculated in the first step, and the scaling exponent 
for the whole series is calculated from (1) by using the results of the fit and
setting n = N, where N is the actual length of the time series. Equation (1) has
not been derived in any rigorous way, only guessed from the example data.
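
The first step of the procedure described above can be sketched in Python as follows. This is only an illustration and not the implementation of Ref. [3]: the use of all overlapping segments, the log-log least-squares fit to the raw periodogram as the per-segment estimator, and the names spectral_slope and segment_averages are assumptions made here. The second step (the fit of relation (1)) is deliberately left out.

```python
import numpy as np

def spectral_slope(x):
    """Least-squares slope of log10(power) against log10(frequency)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # remove the zero-frequency component
    power = np.abs(np.fft.rfft(x)) ** 2   # raw (unnormalised) periodogram
    freq = np.fft.rfftfreq(x.size)        # frequencies in cycles per sample
    keep = (freq > 0) & (power > 0)       # the logarithm is undefined at zero
    slope, _intercept = np.polyfit(np.log10(freq[keep]),
                                   np.log10(power[keep]), 1)
    return slope

def segment_averages(x, sizes=(8, 16, 32)):
    """Average slope over all overlapping segments of each length n (assumed scheme)."""
    x = np.asarray(x, dtype=float)
    averages = {}
    for n in sizes:
        if n > x.size:
            continue                      # segment longer than the series
        slopes = [spectral_slope(x[i:i + n]) for i in range(x.size - n + 1)]
        averages[n] = float(np.mean(slopes))
    return averages

# Hypothetical test signal: a random walk, whose spectrum falls off steeply.
rng = np.random.default_rng(0)
walk = np.cumsum(rng.standard_normal(512))
averages = segment_averages(walk)
print(averages)
```

Note that the segments overlap heavily, so the averaged slopes are not independent estimates; this is the sense in which the text speaks of pseudo-replicates.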

The authors of Ref. [3] claim that MSM gives correct results for time series 
as short as N = 47 terms, but offer little evidence for this claim. They present
a number of examples with artificially created time series of moderate length
N = 400, almost an order of magnitude larger than 47. The time series used
had, by construction, built-in power scaling, and so did all their subseries. In
these cases the MSM method discovered the scaling exponents that were known
beforehand to be present in the data. Two out of four real-world examples
presented had even longer time series (N = 2000 and N = 1024, respectively),
where the power-type behaviour could be detected and the corresponding 
scaling exponents determined by conventional methods. There were only two 
really short (N = 47) time series of unknown properties discussed. The MSM
method, when applied to these two series, gave some numbers as the scaling
exponents, but it is not clear whether these numbers are at all meaningful.

Fig. 1. A quasiperiodic function (2) (thin line) and a time series resulting from
undersampling the function (boxes).

Fig. 2. MSM applied to the data from Fig. 1. The thin line connects the averages
for n = 8, 16, 32.

Table 1

Results of the MSM method applied to the three example time series: g1(n), the
undersampled quasiperiodic function from Fig. 1; g2(n), the undersampled AR(1)
process from Fig. 3; g3(n), the first 47 terms of the full AR(1) series. The last row,
second column, shows the result of applying the formula (1).

In what follows we present examples of short time series where the MSM 
method, when blindly applied, gives clearly misleading results.

Fig. 3. A realization of the AR(1) process (3) (thin line) and an undersampled
subsequence of this signal (boxes). The undersampled series is our second example
time series, and the first 47 terms of the series used to draw the thin line form the
third one.

As a first example, we take a quasiperiodic function 



with a_1 = 0.8, ω_1 = 5, a_2 = 0.6, ω_2 = 20√3, a_3 = −0.8, ω_3 = 30√3, sampled
with a time step T = 7.9/46 ≈ 0.17174 to give N = 47. This time series is
clearly undersampled, and many of the high-frequency features of the function
(2) are lost, cf. Fig. 1. When we apply the MSM method to this time series, we
obtain a scaling exponent of about −0.98, cf. Fig. 2 and Table 1. This result, if
taken at face value, indicates that the time series is governed by a flicker (or
pink) noise, which is obviously untrue: the microscopic mechanism responsible 
for this time series is a regular, quasiperiodic function, albeit undersampled. 
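
The extent of the undersampling can be checked with a few lines of Python. The sketch assumes that the three frequency parameters quoted above are angular frequencies of sinusoidal components; the function name aliased_angular_frequency is introduced here for illustration only. It folds each frequency into the Nyquist interval [0, π/T].

```python
import math

def aliased_angular_frequency(omega, T):
    """Fold an angular frequency into the Nyquist interval [0, pi/T]."""
    omega_s = 2.0 * math.pi / T           # angular sampling frequency
    r = omega % omega_s
    return min(r, omega_s - r)

T = 7.9 / 46                              # sampling step quoted in the text
omega_nyquist = math.pi / T
# Assumed to be angular frequencies of the components of the function (2).
omegas = [5.0, 20.0 * math.sqrt(3.0), 30.0 * math.sqrt(3.0)]
aliases = [aliased_angular_frequency(w, T) for w in omegas]
for w, a in zip(omegas, aliases):
    print(f"omega = {w:8.4f}  appears at {a:8.4f}  (Nyquist: {omega_nyquist:.4f})")
```

Under this assumption the Nyquist angular frequency is about 18.29, so only the slowest component lies inside the Nyquist interval; the other two are folded down to approximately 1.94 and 15.38.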

We used a realization of the Markovian AR(1) process [4] 



with β = 0.69, as in [3], to generate the next two example time series; η_n is
Gaussian white noise, generated numerically by means of an algorithm published
in [5]. We generated 256 terms of the series (3), and then constructed
our second example time series by taking every fifth term of the original series
up to N = 47 (X_5, X_10, ..., X_235). Finally, the first 47 consecutive terms of the
original series (X_1, X_2, ..., X_47) were used as the third example. The output
of the MSM method is presented in Figs. 4 and 5 and in Table 1. Note that 
the function (1) does not seem to be a reasonable fit to these data, and if this 
fit fails, MSM does not offer any other means to systematically determine the 
scaling exponent for the whole series. For the second example the method
indicates the scaling exponent in the range [−0.33, −0.16], i.e. it recognizes the
time series as being close to white noise, which is again not true. On the
other hand, for the third example the method predicts the scaling exponent
in the range [−1.52, −1.05], which is close (especially the lowest value) to a
rough estimate for the scaling exponent for the whole 256-term time series.

Fig. 4. Results of the MSM method applied to the second example time series. The
thin line connects the averages for n = 8, 16, 32.

Fig. 5. Results of the MSM method applied to the third example time series (47
terms). The thin line connects the averages for n = 8, 16, 32. The inset shows the
power spectrum calculated from the whole 256 terms used to generate the data.
The line with the slope −1.5 is not a rigorous fit; it is meant as a guide to the
eye only.

Note that this time series was, by construction, characterized by a power-type 
behaviour. The relative success of MSM in this particular case strengthens our 
point that if a power law is present in the data, scaling it down to frequencies 
not observed in the data may not hurt much, but a short time series does not 
tell whether such a power law is present. 
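
The construction of the second and third example series can be sketched as follows. Several details are assumptions: the recursion is taken in the standard AR(1) form X_{k+1} = β X_k + η_k with unit-variance noise, the starting value is arbitrary, and NumPy's random generator stands in for the algorithm of [5]. Only the coefficient 0.69, the 256 generated terms and the index selection come from the text.

```python
import numpy as np

def ar1(n_terms, beta, rng):
    """Generate X_1..X_n from X_{k+1} = beta * X_k + eta_k (assumed form)."""
    x = np.empty(n_terms)
    x[0] = rng.standard_normal()          # arbitrary starting value (assumption)
    for k in range(1, n_terms):
        x[k] = beta * x[k - 1] + rng.standard_normal()
    return x

rng = np.random.default_rng(1)            # NumPy generator, not the algorithm of [5]
full = ar1(256, 0.69, rng)

# Arrays are 0-based, so X_5 is full[4] and X_235 is full[234].
second = full[4:235:5]                    # X_5, X_10, ..., X_235
third = full[:47]                         # X_1, X_2, ..., X_47
print(len(second), len(third))
```

Both selections contain 47 terms, but the second one spans almost the whole generated record while the third covers only its beginning.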



One may argue that the MSM method failed for the undersampled time series 
only. However, undersampling can be quite common in practice. When dealing 
with real-world time series generated by a mechanism whose details remain 
unknown, we never know whether we have chosen a correct sampling time. For
instance, in the bacterial population data analysed in Ref. [3] the sampling
time was as long as a week, while the true dynamics of the population might
have been governed by processes with a much shorter characteristic time, like
daily changes in sunlight or temperature. Similarly, in many medical studies 
levels of drugs in subjects are determined on a daily basis, while physiological 
processes responsible for the drug absorption and decay usually proceed much 
faster. There are many other situations in which data are collected not in 
a controlled laboratory environment, but come out of real-world processes, 
where adjusting the sampling time may be impossible for various "technical" 
reasons. In the case of undersampling, the power spectrum beyond the Nyquist
frequency is aliased into the Nyquist interval, distorting the available power 
spectrum [2]. Examples presented above show that the MSM method, when 
applied to such data, may give wildly wrong results. 
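
The aliasing effect mentioned above is easy to reproduce numerically. In the following sketch all numbers (series length, sampling time, signal frequency) are hypothetical and chosen only so that the result falls exactly on a discrete frequency: a sinusoid whose frequency lies above the Nyquist frequency shows up in the periodogram at a frequency inside the Nyquist interval.

```python
import numpy as np

N = 50                        # hypothetical series length
T = 1.0                       # hypothetical sampling time (arbitrary units)
f_true = 0.8                  # true frequency, above the Nyquist frequency 1/(2T) = 0.5

t = np.arange(N) * T
x = np.sin(2.0 * np.pi * f_true * t)

power = np.abs(np.fft.rfft(x)) ** 2       # raw periodogram
freq = np.fft.rfftfreq(N, d=T)            # frequencies from 0 to 1/(2T)
f_peak = freq[np.argmax(power)]
print(f_peak)
```

The sampled data alone cannot distinguish the true frequency 0.8 from its alias at 1/T − 0.8 = 0.2, which is where the periodogram peak appears.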

One needs to know some important details of the underlying process in order to 
estimate its parameters from a very short time series. Otherwise idiosyncrasies 
of the data are mistakenly taken for true correlations and are enhanced in the 
process of pseudo-replicating the time series. This may lead to a failure of
the MSM method. This method appears to work tolerably well for long time
series, but in this case it is inferior to conventional techniques due to its large
computational costs. In our opinion, the usefulness of the MSM method is 
thus limited to the rare case when one knows for sure that the process used to 
generate the time series is governed by a power law with an unknown exponent, 
but the time series available is very short and cannot be easily extended and 
only a rough estimate of the scaling exponent is required. In other situations 
the MSM method is not recommended. 

I would like to thank Dr. Adam Kleczkowski from Cambridge University for 
his helpful comments. 